Abstract: Integration of several vision modules is likely to be one of the
keys to the power and robustness of the human visual system. The problem
of integrating early vision cues is also emerging as a central problem in
current computer vision research. In this paper we suggest that integration
is best performed at the location of discontinuities in early processes, such
as discontinuities in image brightness, depth, motion, texture and color.
Coupled Markov Random Field models, based on Bayes estimation techniques,
can be used to combine vision modalities with their discontinuities. These
models generate algorithms that map naturally onto parallel fine-grained
architectures such as the Connection Machine. We derive a scheme to integrate
intensity edges with stereo depth and motion field information and show
results on synthetic and natural images. The use of intensity edges to
integrate other visual cues and to help discover discontinuities emerges as a
general and powerful principle.
1  Introduction

One  of  the  keys  to  the  reliability,  flexibility  and  robustness  of  biological  visual 
systems  is  their  ability  to  integrate  several  different  visual  cues.  Early  vision 
processes  such  as  stereo,  motion,  texture,  shading  and  color  give  separate 
cues  to  the  distance  of  three-dimensional  surfaces  from  the  viewer  and  to 
their  material  properties.  Integration  of  the  evidence  provided  separately  by 
these  cues  can  provide  a  more  reliable  map  of  the  surfaces  and  their  properties 
than  any  single  cue  alone. 

Thus visual integration is likely to be a key to understanding biological
visual systems and to developing robust vision machines. Existing methods do
not seem capable of providing a general solution. Standard regularization[2]
provides a common framework for many early vision problems and leads to
the minimization of quadratic energy functionals. If standard regularization
is used to integrate information from different processes, the energy
functional consists of the sum of quadratic parts, each associated with a
separate process. This implies that the result is a linear combination of the
different cues (possibly with space-varying coefficients). Linear combination,
say of depth from stereo and from shading, does not seem, however, a flexible
enough integration method. Even more important, no instance of standard
regularization can handle discontinuities, because the solution space is
restricted to generalized splines[21,2]. As we will explain later, we believe
that detecting and representing discontinuities (for instance depth
discontinuities) is a key part of the integration step[21].

To overcome these difficulties we have developed an extension of
regularization that promises to deal simultaneously with discontinuities and
with the integration of vision modules. This extension is based on the use of
coupled Markov Random Fields¹, introduced recently by Geman and Geman[9] and
extended by Marroquin, Mitter and Poggio[19]. The standard regularization
method for vision is a special case of this new approach.
1.1  The  Role  of  Discontinuities

One of the most important constraints for recovering surface properties is
that the physical processes underlying image formation are typically smooth:
depth and orientation of surfaces are mostly continuous and so are reflectance
and illumination. The smoothness property is captured well by standard
regularization. Surfaces and their properties, however, are not always smooth:
they are smooth almost everywhere, but not at discontinuities. Lines of
discontinuity are themselves usually continuous, relatively smooth,
nonintersecting curves. It is critical to detect the discontinuities reliably,
because they usually represent the most important locations in a scene: depth
discontinuities, for instance, often correspond to the boundaries of an object
or of a part. Furthermore, discontinuities play a critical role in fusing
information from different physical processes. The reason is clear: in smooth
regions, the physical processes are coupled together by the imaging equation,
and all contribute to image formation. However, the coupling is difficult to
know precisely: it depends on quantities such as the form of the reflectance
function. The effects of discontinuities are instead robust and qualitative:
for instance, depth discontinuities usually correspond to intensity edges.
Therefore, discontinuities are ideal places for integrating information.
Furthermore, partial information about discontinuities in a single process can
be detected relatively easily. Several types of motion discontinuities, for
example, can be measured with simple operations on the time-dependent
intensity array, especially if the interframe interval is small. Partial albedo
discontinuities are also often detectable using simple operations. Intensity
edges are detected quite reliably by the Canny edge detector. However, the
fast, rough detection of discontinuities performed by these early operations
is noisy and incomplete: it must be refined by integrating the detected
discontinuities across processes and by exploiting constraints on the
continuity of discontinuities.

¹ A different, interesting approach has been explored by Blake[3].

In  summary,  discontinuities:  1)  represent  the  most  useful  information,  2) 
are  easy  to  detect  (though  in  a  partial  and  possibly  noisy  way)  and  3)  provide 
good  locations  to  integrate  different  cues. 

1.2  Coupled  Markov  Random  Fields 

Markov Random Fields for image modeling have seen increasing use since
the work of Geman and Geman[9]. Their utility for image modeling derives
from several MRF characteristics. MRFs provide a natural way to
impose general image properties of smoothness and continuity, for example
of depth and motion, while also incorporating discontinuities. Bayes' rule
establishes a relationship between the possibly corrupted observed data and
the desired scene data. Solution methods are available, though often time
consuming. Some recent MRF applications have involved scene segmentation
using depths[18], texture[6] and motion[20].

A  Markov  Random  Field  on  a  lattice  can  be  represented  as  a  lattice  of 
sites,  each  one  with  a  random  variable.  The  value  depends  probabilistically 
on  the  value  of  neighboring  sites.  The  rules  governing  this  local  dependence 
can  be  given  in  a  variety  of  ways  and  can  be  made  to  capture  constraints 
such  as  the  continuity  of  a  surface  (if  the  MRF  represents  depth  values). 

Our idea is to associate a MRF on a lattice to each physical process to be
integrated and another (binary) MRF to its discontinuities (see figure 1). The
lattices are coupled to each other to reflect the interdependence of the
corresponding processes in image formation. Thus the various MRFs mirror the
different physical events that underlie image formation: surface and surface
discontinuities, spectral albedo and albedo discontinuities, shadows, surface
normal, and so on. Physical constraints apply to each of these processes
independently. In addition, there are constraints between these processes (for
instance between depth and surface normal). The image data constrain the
way the processes combine. Note that consideration of sequences of images in
time will introduce additional powerful constraints such as rigidity. The
constraints on the surfaces are local conditions (such as smoothness, necessary
mainly because of its regularizing role in the face of omnipresent noise) valid
everywhere except at discontinuities. As we discussed earlier, discontinuities
are critically important and should be detected early.

Notice  that  the  coupling  of  the  line  process  with  the  associated  continuous 
process  provides  a  module  that  combines  region-based  with  boundary-based 
segmentation  (see  figure  1). 

The local potentials underlying the a priori probability distribution of the
MRFs represent the constraints on the physical processes (smoothness,
positivity, values within certain bounds, etc.); the coupling between MRFs
represents the compatibility constraints between processes. The device of
coupled MRFs provides an ideal tool to impose local constraints such as
smoothness, allowing at the same time an explicit role for discontinuities
through the line processes[9] and similar processes such as occlusions[19].
Our new idea is to incorporate additional observable discontinuity data
provided by algorithms specialized to detect sharp changes in the observed
properties of intensity, motion, stereo disparity, texture, and so on. The
observable discontinuities provide an initial rough solution to the
segmentation problem. Using the MRFs for estimating the fields gives
increasingly precise solutions, simultaneously filling in the continuous
regions that are only sparsely observable. The solution at each iteration is
available to later modules, such as recognition.

Figure 1: MRF lattices representing the output of different early processes
and their discontinuities (the crosses represent the sites of the binary line
processes). Each representation, for instance depth, is coupled to its
discontinuities and to other cues such as intensity or motion.

1.3  The  Key  Role  of  Intensity  Edges 

One of the results of our integration work is that intensity edges play a
primary role in guiding the search for discontinuities in other processes (for
instance depth). The point seems so important that we would like to phrase
it as a rather general conjecture on the proper organization of the integration
stage: intensity edges guide the detection of discontinuities in the other
physical processes, thereby coupling surface depth, surface orientation,
shadows, specularities and surface markings to the image data and to each
other.

The reason for the critical role of intensity edges is intuitively clear:
changes in surface properties (depth, orientation, material, texture) usually
produce large intensity gradients in the image. Under the assumption of
opacity and of a simple imaging model (the reflectance function is assumed
to contain a Lambertian and a specular term), there are six physical causes
for large intensity gradients in the image: occluding edges (extremal edges
and blades), folds, shadow edges, surface markings and specular edges. In
addition, motion discontinuities are usually coupled to intensity edges. It is
for exactly this reason that edge detection is so important in artificial and
probably also biological vision.

1.4  Plan  of  the  Paper 

In this paper we introduce a method for detecting and reconstructing depth
discontinuities by using the information provided by intensity edges. We do
the same for motion discontinuities. First we introduce the Markov Random
Field formalism. The use of intensity edges for surface interpolation is
discussed next, together with the derivation of the associated MRF model. We
then describe our Connection Machine implementation and the results on
synthetic and real data. Finally the discussion focuses on the open problems
and on the implications of our results for the general problem of integrating
all vision modules.


2  Coupling  Intensity  Edges  with  Sparse  Depth  Data

To illustrate our approach we consider the specific and important problem of
computing an approximate surface and especially the surface depth
discontinuities from sparse depth data[10,25,18]. The main new idea here is to
exploit the integration of additional vision cues. In particular we describe a
scheme in which intensity edges are integrated with sparse depth data. Sparse
depth data arise from the output of feature-based stereo algorithms. Typical
stereo algorithms provide depth data at a subset of image features[15,10,8].
These features might be a Laplacian filter's zero-crossings from one of the
intensity images. The depth information is computed by measuring pixel
displacements (disparity) between corresponding image features. As is typical
of all known stereo algorithms, the disparities are plagued by errors precisely
at depth discontinuities where surfaces are usually occluded.

The problem, then, is to smooth and fill in the sparse depth data (i.e.,
reconstruct the surface), while detecting the critically important depth
discontinuities. Prior attempts at depth discontinuity identification allowed
the discontinuities to form anywhere in the image provided the depth
difference between neighboring sites was significant[18,24]. Due to the
sparseness and noise in the depth data, the identified discontinuities are:
1) offset from and 2) ragged or wiggly compared with the correct
discontinuities. These limitations become more serious when the images contain
a large range of depth differences, as in natural images.

Because of the constraints on image formation discussed earlier, the correct
depth discontinuities will, in almost all cases, correspond precisely to the
locations of intensity edges. Our integration scheme exploits this by
restricting depth discontinuity formation to a subset of the intensity edges.
This restriction ensures that the smoothness and continuity of discontinuities
can be no worse than those of the intensity edges themselves. In addition, the
difficult problem of MRF parameter specification is simplified since this
integration scheme proves less sensitive to MRF parameter variation,
particularly when the depth data contain a large range of depth differences.

There are some cases in which discontinuities will not occur at intensity
edges. Any object that blends in with its background presents such a case.
This situation occurs rarely in natural scenes; yet, for practical reasons such
as camera underexposure or saturation, the object may blend in with the
background at some locations. However, for these cases, the point is
somewhat moot, since without intensity edges, feature-based stereo or motion
algorithms will not provide depth or motion data.

A more general situation arises when the features used for stereo or motion
are different from the discontinuity-limiting features. This is desirable
since the continuity constraints used by stereo and motion algorithms assume
that the features used for matching are located on surfaces. Thus stereo and
motion algorithms should use high-resolution, dense features that identify
surface markings as opposed to bounding contours, which in general correspond
to surface locations that are different in the two images of a stereo
pair. The discontinuity-limiting features, however, can be chosen to better
correspond to object boundaries.

The results section contains examples in which the discontinuities are
identified and the surface reconstructed both with and without the benefit
of intensity edge information. The next section presents a limited overview
of MRF particulars and contains the appropriate MRF energy function for
integrating intensity edges with, in this case, the sparse depth data produced
by a stereo algorithm.

3  MRF  Formulation  for  Stereo  and  Intensity  Edge  Coupling

The theory of Markov Random Fields can be found elsewhere[9,17]. We
present only an overview here followed by a description of the energy
functions used for integration.

The Hammersley-Clifford theorem states the equivalence between a MRF
and a Gibbs distribution as follows. If $X$ is a MRF on a lattice $S$ with
respect to the neighborhood system $\mathcal{G}$, then $P(X = \omega)$ is
given by:

$$P(X = \omega) = \frac{1}{Z} e^{-U(\omega)/T} \qquad (1)$$

$Z$ is a normalization factor, $T$ is the temperature and $U(X)$ is the energy
function. The temperature parameter, $T$, could be absorbed into $U(X)$;
however, when the solution method is discussed, $T$ proves useful as a separate
variable. The energy function is of the form:

$$U(X) = \sum_{c} U_c(X). \qquad (2)$$


The sum of the potentials, $U_c(X)$, is over the neighborhood's cliques. A
clique is either a single lattice site or a set of lattice sites such that any
two sites belonging to it are neighbors of one another. The function
$P(X = \omega)$ is called the prior distribution and abbreviated here by $P(X)$.
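
As a hedged illustration of equations 1 and 2, the following Python sketch enumerates every state of a tiny binary chain, sums pairwise clique potentials into an energy, and normalizes the Gibbs weights. The function name `gibbs_distribution`, the 1-D chain topology, and the pair potential supplied by the caller are assumptions made for this example only; they are not part of the formulation above.

```python
import itertools
import math

def gibbs_distribution(n_sites, clique_potential, temperature=1.0):
    """Return {state: probability} for a binary 1-D chain (illustrative only).

    Assumption: the only cliques are pairs of adjacent sites, so the energy is
    U(x) = sum_k clique_potential(x[k], x[k+1]), as in equation 2.
    """
    states = list(itertools.product((0, 1), repeat=n_sites))
    energies = [
        sum(clique_potential(s[k], s[k + 1]) for k in range(n_sites - 1))
        for s in states
    ]
    # Unnormalized Gibbs weights exp(-U/T), as in equation 1.
    weights = [math.exp(-u / temperature) for u in energies]
    z = sum(weights)  # normalization factor Z
    return {s: w / z for s, w in zip(states, weights)}

# A smoothness potential: neighbouring sites that disagree cost energy.
prior = gibbs_distribution(3, lambda a, b: (a - b) ** 2)
```

Exhaustive enumeration is only feasible for a handful of sites; it is used here because it makes the role of the normalization factor explicit.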

The prior distribution on $X$, where $X$, for example, might be the
reconstructed surface, must be determined based on some observations or input
data, $Y$. To relate $X$ to $Y$, Bayes' formula is used.

$$P(X|Y) = \frac{P(Y|X) P(X)}{P(Y)} \qquad (3)$$


The observations, $Y$, are obtained conceptually by degrading $X$, such as by
the addition of noise or blurring. If the type of degradation is known, the
distribution $P(Y|X)$ can be computed. Marroquin[17] has shown that for
the case of zero-mean white Gaussian noise, $P(Y|X)$ is a Gibbs distribution
with potential:

$$U(Y|X) = \sum_{i \in S} U_i(Y|X); \qquad U_i(Y|X) = \alpha \gamma_i (x_i - y_i)^2. \qquad (4)$$

The sum is over all lattice sites and

$$\gamma_i = \begin{cases} 1, & \text{if input data exists at lattice site } i \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$
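
The data potential of equations 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming the field, the observations and the sparseness indicator are stored as flat lists; the name `data_energy` is hypothetical.

```python
def data_energy(x, y, gamma, alpha=1.0):
    """Sum of alpha * gamma_i * (x_i - y_i)^2 over all lattice sites.

    x     : current estimate of the field at each site
    y     : observed value at each site (ignored where gamma is 0)
    gamma : 1 where input data exists, 0 otherwise (equation 5)
    """
    return sum(alpha * g * (xi - yi) ** 2 for xi, yi, g in zip(x, y, gamma))

# Only the first and last sites carry data, so the middle mismatch is free.
energy = data_energy([1.0, 2.0, 3.0], [1.0, 0.0, 5.0], [1, 0, 1], alpha=0.5)
```

Sites without data contribute nothing, which is what lets the prior fill in the field between sparse measurements.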


When this result for $P(Y|X)$ is combined with the MRF prior distribution,
$P(X)$, and Bayes' rule, the a posteriori distribution $P(X|Y)$ is:

$$P(X|Y) = \frac{1}{Z} \exp\left\{ -\frac{1}{T} U_p(X|Y) \right\} \qquad (6)$$

for $U_p(X|Y) = U(X) + U(Y|X)$ and with $Z$ a normalization constant
independent of $X$. This a posteriori distribution provides the likelihoods
for all possible states $X$, given the observable data $Y$.

Given the posterior distribution $P(X|Y)$ and the external field $Y$, the
desired field $X$ can be retrieved once a suitable error criterion is
specified. The Maximizer of the Posterior Marginals (MPM) reduces the problem
of annealing and has been successfully applied for our results. With the
criterion specified, the relaxation algorithm for solution is largely
determined. The question of a suitable error criterion and its algorithmic
consequences has been thoroughly discussed by Marroquin[17].

The problem has now become one of specifying the MRF potentials,
$U_c(X)$ and $U_i(Y|X)$. The potentials impose the physical constraints of
continuity and smoothness of surfaces (except at depth discontinuities) along
with continuity and smoothness of depth discontinuities. These constraints
are imposed by tailoring the energy function to minimize the energy
(maximize the probability) when the state occupied satisfies the desired
physical constraints. Typically this choice is empirical, although one might
envisage estimating the prior associated with, for instance, depth smoothness
from a specific class of surface data.

The MRF state space used herein is similar to that of Geman and Geman[9]
along with Marroquin[17], where each lattice site is composed of a depth
process and two line processes, $X = \{F, L\}$. The depth process, $F$, is a
continuous random variable whose value is related to the distance of a surface
point from the observer. The value of $F$ at site $i$ is denoted as $f_i$ where
$-\infty < f_i < \infty$. The depth process neighborhood system of site $i$
consists of the four nearest neighbors: east, south, west and north, to $i$.
Although a continuous random variable should not be updated using the Heat
Bath algorithm, the depth process can be deterministically updated[17],
provided the MRF energy is suitably defined. Figure 2 illustrates the MRF
lattice with the depth and line processes.

The line process used here, $L$, contains a vertical and a horizontal
orientation that are conceptually located between lattice sites. The vertical
line process is located between its lattice site and the neighboring eastern
lattice site, whereas the horizontal line process separates its lattice site
and the nearest southern lattice site. Each orientation is a binary random
field, $l_i^j \in \{0, 1\}$, where the scripts on $l_i^j$ denote the line
process that separates lattice site $i$ from $j$. The horizontal line process
at site $i$ is denoted as $l_i^h$, the vertical line process as $l_i^v$.
Smoothing of the depth process is inhibited when the line state is on,
$l_i^j = 1$, since smoothing should not occur across depth discontinuities;
otherwise, depth process smoothing is performed. An on state signifies the
presence of a depth discontinuity. The conditions for depth discontinuity
formation are encapsulated in the MRF energy function presented subsequently.

Figure 2: (a) A lattice site is composed of a single depth process (illustrated
with a circle) along with a vertical and a horizontal line process. The MRF
lattice consists of a rectangular grid of these lattice sites. (b) The
neighborhood for the depth process and the vertical line process neighborhood.
The black dot in the line process neighborhood indicates the lattice site for
this neighborhood. (c) The five maximal cliques (north, east, south, west
and central) for the vertical line process are shown. In this paper we only
consider configurations of the central clique. This is equivalent to assigning
zero energies to all configurations of the other four cliques.

The external fields to the MRF are the sparse depth information and the
intensity edges. The sparse depths, $G$, are represented by two variables,
$g_i$ and $\gamma_i$, for site $i$. The value $g_i$ is analogous to $f_i$; it
is continuously valued over the real numbers, although in practice, since
$g_i$ is provided by stereo output, it is discrete. The variable $\gamma_i$
encodes the sparseness of the stereo output and is defined as in equation 5.

The intensity edges are represented by the field, $E$. This field is similar
to the line process, $L$, except that $e_i^j = 1$, rather than indicating the
presence of a depth discontinuity, permits the formation of a depth
discontinuity between lattice site $i$ and neighbor $j$. The MRF energy is
designed so that $e_i^j = 0$ implies (in the present implementation)
$l_i^j = 0$ for all $i, j \in S$. An edge detector, such as Canny's[4], will
mark a site $i$ as an edge, but $e_i^j$ marks potential discontinuities
between sites $i$ and $j$. To resolve this ambiguity, if an edge is at site
$i$, then $e_i^k = 1$ where $k$ is each of the nearest neighbors to site $i$.
This intensity edge field, $E$, along with $G$ comprise the MRF external
field $Y$ such that $Y = \{G, E\}$.
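
The rule for converting site-based edge labels into between-site permissions can be sketched as follows. This is an illustrative Python fragment, not the Connection Machine implementation: the function name `edge_permissions` and the nested-list layout are assumptions, and the placement of the two orientations follows the convention stated earlier (vertical element between a site and its eastern neighbor, horizontal element between a site and its southern neighbor).

```python
def edge_permissions(edges):
    """Map a binary site edge map to between-site permission fields.

    edges[r][c] == 1 marks an intensity edge at site (r, c).
    Returns (e_vert, e_horiz), where
      e_vert[r][c]  == 1 permits a line between (r, c) and (r, c + 1),
      e_horiz[r][c] == 1 permits a line between (r, c) and (r + 1, c).
    A permission is granted when either of the two sites is an edge site.
    """
    rows, cols = len(edges), len(edges[0])
    e_vert = [[0] * cols for _ in range(rows)]
    e_horiz = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and (edges[r][c] or edges[r][c + 1]):
                e_vert[r][c] = 1
            if r + 1 < rows and (edges[r][c] or edges[r + 1][c]):
                e_horiz[r][c] = 1
    return e_vert, e_horiz

# A single edge site in the middle of a 3 x 3 lattice.
e_vert, e_horiz = edge_permissions([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

A single edge site therefore enables all four line elements that border it, which is the behavior the text describes for the nearest neighbors of site $i$.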

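Given the external fields, $Y$, and the random variables, $X$, equation 6
provides the posterior distribution with the MRF energy given as

$$U_p(x|y) = \sum_i U_i(x|y),$$

$$U_i(x|y) = \alpha \gamma_i (f_i - g_i)^2 + \sum_{j \in nn} (1 - l_i^j)(f_i - f_j)^2 + [\text{line clique energy term, factor } \beta] + [\text{intensity edge coupling term, factor } \beta'] \qquad (7)$$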

The first term in this equation is the coupling between the depth process
and the sparse and noisy input data. The coupling factor, $\alpha$, is related
to the noise in $g$: for noiseless data, $\alpha \to \infty$, thereby ensuring
$f_i = g_i$. Otherwise, when $\alpha = 0$ no input data coupling occurs and
$f$ is smoothed by the term involving $(f_i - f_j)^2$ in equation 7. The
precise relation between $\alpha$ and the noise depends on the noise model
assumed. For a model of measurement that includes Gaussian random noise,

$$\alpha \sim \frac{1}{\sigma^2}$$

where $\sigma$ is the Gaussian's half width at half maximum[17]. Note that if
the noise model's parameters vary locally, it might be appropriate to vary
$\alpha$ locally as well.

Local variation in noise parameters does occur in the stereo algorithm of
Drumheller and Poggio[7]; this variation is reflected in the stereo match
scores of that algorithm. The present paper does not address this issue; here
we keep $\alpha$ constant, usually in the range 0.1 to 2.0. The input data
coupling to $f$ occurs when $\gamma_i = 1$. Typically 5 to 10% of the lattice
sites have input depths associated with them.

The last term in equation 7 implements the integration scheme between
sparse stereo depths and intensity edges. The term forbids depth
discontinuity formation except where an external edge exists. Discontinuity
formation is prevented by letting $\beta' \to \infty$. When $l_i^j = 1$ and
$e_i^j = 0$, this term contributes a large energy, $U_i(x|y) \to \infty$, and
the associated probability for $l_i^j = 1$ is zero. At sites where
$e_i^j = 1$ this energy term contributes nothing and the depth discontinuity
formation is determined by the other factors in equation 7. The problems of
misalignment might be handled by suitably modifying this term in the energy
$U_i(x|y)$ to produce a cone of influence or, for a simple case, by
"thickening" the input intensity edges. For instance, we may use instead
of $e_i^j$ in equation 8, $e_i^j * G$, where $*$ denotes convolution and $G$
is a Gaussian or another appropriate cone of influence function. The results
presented in this paper do not utilize a cone of influence.
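
The simple "thickening" case can be illustrated with a binary dilation. The sketch below is an assumption-laden stand-in: it uses a square window of hypothetical `radius` rather than a Gaussian cone of influence, and the name `thicken_edges` is invented for the example. It is not used to produce any result reported here.

```python
def thicken_edges(edges, radius=1):
    """Dilate a binary edge map with a (2*radius + 1)-wide square window.

    Every site within `radius` rows and columns of an edge site is marked,
    so slightly misaligned depth discontinuities still fall on an edge.
    """
    rows, cols = len(edges), len(edges[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not edges[r][c]:
                continue
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out

thick = thicken_edges([[0, 0, 1, 0, 0]])
```

A wider window tolerates more misalignment but also weakens the constraint that discontinuities coincide with intensity edges.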

The second and third terms in equation (7) encapsulate our prior
expectations concerning depth discontinuities and surface reconstruction. They
compose the potential $U(X)$ of the prior distribution (equation 1). These
two terms 'compete' in the sense that turning on a line costs the line clique
energy of the third term but saves the smoothing energy $(f_i - f_j)^2$. The
interplay of these two potentials largely determines the formation of depth
discontinuities where $e_i^j = 1$. The second term couples the line and depth
processes; the third term determines the line process clique energy. This line
and depth process coupling is summed over the nearest neighbors, $nn$, to site
$i$, with each neighbor contributing an energy $(f_i - f_j)^2$ when
$l_i^j = 0$.

The quadratic term, $(f_i - f_j)^2$, tends to smooth the depth process since
it is minimized when $f_i = f_j$. Depth discontinuities have a higher
probability of forming when the energy to create a line is less than this
energy to smooth the depths. The factor $\beta$ is a free parameter that
determines what size depth difference is likely to produce a depth
discontinuity. Specification of $\beta$ is largely image dependent and,
although a suitable range has been determined, a general theory specifying
$\beta$ remains elusive. The line process clique energy will be examined in
detail later.
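
The competition between the cost of a line and the smoothing energy it saves can be made concrete with a small Python sketch of a two-state Heat Bath probability. Two simplifications are assumed and are not part of the model above: the line clique energy is replaced by a constant cost `beta` per line, and only a single pair of sites is considered. The name `line_on_probability` is hypothetical.

```python
import math

def line_on_probability(f_i, f_j, e_ij, beta=1.0, temperature=1.0):
    """Probability that the line between sites i and j is on.

    Energy with the line off: (f_i - f_j)^2   (smoothing penalty is paid)
    Energy with the line on : beta            (assumed constant line cost)
    With e_ij == 0 the on state is forbidden, so the probability is zero.
    """
    if e_ij == 0:
        return 0.0
    x = (beta - (f_i - f_j) ** 2) / temperature
    if x > 700.0:  # exp(x) would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

p_large_step = line_on_probability(0.0, 3.0, 1, beta=1.0)
p_small_step = line_on_probability(0.0, 0.1, 1, beta=1.0)
```

Under these assumptions the two states are equally likely exactly when the squared depth difference equals `beta`, which is one way to read the statement that $\beta$ sets the size of depth difference likely to produce a discontinuity.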

The Heat Bath algorithm cannot be simply applied to equation 7 since
the $f_i$ are continuous variables. Instead we employ a technique to smooth
the depth process deterministically, but to update the line process
stochastically with the Heat Bath algorithm[17]. With the line process state
fixed, the MRF energy of equation 7 is non-negative definite quadratic with a
stable and unique fixed point for the $f_i$ (practically, $\beta'$ never
contributes since the configuration $e_i^j = 0$ and $l_i^j = 1$ has a
vanishing probability). In this situation, the depth process can be smoothed
deterministically to find the fixed point. After this fixed point in depth is
determined, the line process is stochastically updated, the new fixed point in
depth is determined and the scheme is repeated.
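
The deterministic half of this scheme can be sketched in Python for a 1-D chain. The sketch makes several assumptions that are not taken from the text: Gauss-Seidel sweeps are used as the relaxation method, each neighbor pair is counted once in the energy (which only rescales $\alpha$), and the names `smooth_depth`, `lines` and `sweeps` are invented for the example.

```python
def smooth_depth(g, gamma, lines, alpha=1.0, sweeps=200):
    """Relax a 1-D depth chain to its fixed point with the lines held fixed.

    g[i]     : sparse depth datum at site i (ignored where gamma[i] == 0)
    gamma[i] : 1 if site i has input data, else 0
    lines[i] : 1 if a discontinuity separates site i and site i + 1, else 0
    Minimizes sum_i alpha*gamma_i*(f_i - g_i)^2
            + sum_i (1 - lines[i]) * (f_i - f_{i+1})^2.
    """
    n = len(g)
    f = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            num = alpha * gamma[i] * g[i]
            den = alpha * gamma[i]
            if i > 0 and lines[i - 1] == 0:       # coupled to the left neighbour
                num += f[i - 1]
                den += 1.0
            if i < n - 1 and lines[i] == 0:       # coupled to the right neighbour
                num += f[i + 1]
                den += 1.0
            if den > 0.0:                         # isolated data-free sites keep their value
                f[i] = num / den
    return f

# Data only at the two ends; a line between sites 2 and 3 blocks smoothing.
depth = smooth_depth([1, 0, 0, 0, 0, 5], [1, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0])
```

In this example the surface is filled in as two flat segments, one at each data value, because the line prevents the smoothing term from averaging across the discontinuity.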

Once the line process approaches equilibrium (roughly 1000 iterations),
statistics are gathered to compute the MPM estimate. The MPM estimate is
computed from $P(l_i^j = 1) = \frac{1}{n} \sum_{t=1}^{n} l_i^j(t)$, where $n$ is the number of iterations over
which statistics are gathered[17]. When $P(l_i^j = 1) > (0.5 + 1/\sqrt{n})$, statistical
fluctuations about 0.5 are reduced and the MPM estimate is turned on to
mark a discontinuity. Use of the MPM estimate does not require annealing
but the a posteriori distribution's coupling parameters must produce a
reasonable amount of line process agitation, thereby sampling much of the line
process sample space.
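
As a small illustration of the thresholding rule above, the following sketch estimates $P(l = 1)$ as the fraction of gathered iterations in which a line site was on and compares it with $0.5 + 1/\sqrt{n}$. The array layout and the function name are assumptions made for the example.

```python
import numpy as np

def mpm_line_estimate(samples):
    """Threshold the empirical on-frequency of each line site.

    samples: array of shape (n, ...) holding 0/1 line states, one slice per
    iteration gathered after the sampler has approached equilibrium.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    p_on = samples.mean(axis=0)           # empirical P(l = 1) per site
    threshold = 0.5 + 1.0 / np.sqrt(n)    # margin against fluctuations near 0.5
    return p_on > threshold, p_on
```

With n = 100 the threshold is 0.6, so a site that was on in 70 of the iterations is marked as a discontinuity while a site that was on in 55 is not.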

3.1  Choice  of  Line  Clique  Energies 

Figure 2 shows the line process neighborhood for the vertical line process.
Of the five cliques shown for this neighborhood, only the clique centered
about the vertical lattice site has, by design, a non-zero potential.
This potential depends on the 256 possible configurations associated with
the clique. The desirable configurations are a small subset of all possible
configurations and they impose the constraints of smoothness and continuity
on the depth discontinuities. These constraints are embodied in the following
five heuristics which divide the desirable configurations into classes:


[Figure 3 panel labels: Line Creation, Line Growth, Line Completion, Tee
Completion; variants labeled Straight, Angled and Cornered.]

Figure 3: The four classes of non-forbidden line configurations for the
vertical line process. A dot represents an off state; on states are shown with
their oriented lines. The symmetry operations producing the other allowed
configurations are discussed in the text. The horizontal line process
configurations are identical provided the vertical line process cliques are rotated by
90 degrees.

•  Turn on a lone site provided a 'large' depth discontinuity is present
[Line Creation].

•  Turn on a site extending an already present line segment even if the
depth discontinuity is 'small' [Line Growth].

•  Always turn on a site if doing so would connect two line segments [Line
Completion].

•  Allow tees to occur infrequently where supported by at least a 'small'
depth discontinuity [Tee Completion].

•  All other configurations should occur rarely if at all [Forbidden].


Examples of the first four classes are shown in figure 3. In addition
to these configurations, three symmetry operations produce the other
non-forbidden classes. These symmetry operations are: rotation by 180 degrees
about an axis perpendicular to the page, reflection about the vertical axis (for
the vertical line process orientation) and the 180 degree rotation followed by
the reflection operation. With these symmetry operations and clique classes,
a total of 22 unique configurations are allowed from the original set of 256.
When $l_i^j = 0$ (line is off), the clique potential is 0. However, when $l_i^j = 1$, the
clique energy is determined by the five classes; this is the energy required to
turn on the line.
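
The three symmetry operations can be sketched on a toy representation. The example below assumes a clique configuration is stored as a small boolean grid, which is a hypothetical layout and not the clique geometry of figure 2; it only shows how the identity, the 180 degree rotation, the reflection about the vertical axis and their composition generate the variants of one configuration.

```python
import numpy as np

def symmetric_variants(config):
    """Return the distinct images of a configuration under the symmetry group.

    config: 2-D array-like of 0/1 values (hypothetical grid layout).
    """
    config = np.asarray(config, dtype=bool)
    rot180 = np.rot90(config, 2)           # rotation by 180 degrees in the page
    candidates = [
        config,                            # identity
        rot180,
        np.fliplr(config),                 # reflection about the vertical axis
        np.fliplr(rot180),                 # rotation followed by reflection
    ]
    # Deduplicate: symmetric configurations map onto themselves.
    unique = {c.tobytes(): c for c in candidates}
    return list(unique.values())
```

An asymmetric pattern yields four variants, while a pattern that is itself symmetric under all three operations yields only one, which is why the number of allowed configurations is smaller than four times the number of drawn examples.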

The line process clique considered here is only one of the cliques
associated with the neighborhood shown in figure 2. In previous work[9,17], the
smaller neighborhood did not readily produce lines of any orientation; the
cliques tended to create vertical or horizontal line segments. The 'large'
neighborhood used here (though incompletely, because we assign zero
energies to several cliques) does encourage isotropic line formation without
exacting too high a computational penalty.

4  Stereo  and  Synthetic  Image  Results 

The MRF scheme for coupling intensity edges to sparse stereo depth data
has been implemented on a Connection Machine[11]. The sparse depth data
and intensity images from both real stereo and synthetic images have been
examined. This section presents these image results for some typical images.

4.1 Connection Machine Implementation

The Connection Machine (CM) is a fine-grained parallel computer
manufactured by Thinking Machines Corporation. We used their CM-1 model with
16k processors. Each processor is connected to its four nearest neighbors
(north, east, south and west) in a two-dimensional grid, the NEWS network,
and each 16 processor group is connected to a 12-dimensional hypercube, the
Router. These two communication modes allow fast access between
neighboring processors and logarithmic-time access between any two processors.
Each processor is a simple 1-bit processor with 1 kilobits of memory. All
processors execute a single instruction stream. The CM was configured to
match the image size, 256 x 256, by using virtual processors.


For the MRF implementation each CM processor represents an MRF
lattice site. This configuration proves ideal for implementing the MRF cliques
over the CM NEWS network. The limited number of non-forbidden line
clique states and energies are stored in tabular form at each processor.
Determination of the line clique state requires access to the four nearest
neighbors plus the north-east (south-west) neighbor for the vertical (horizontal)
orientation. At the image borders, the line processes are always on, thereby
conveniently preventing depth process smoothing beyond the borders.

The MRF input data was obtained from two previously implemented
CM-1 algorithms. For the real stereo depth data, MIT's Eye-Head system
provided the stereo pair and the Drumheller-Poggio CM-1 stereo algorithm[8]
produced the disparity data at a subset of DOG zero-crossing features. The
intensity edges came from Todd Cass' [13] implementation of Canny's edge
detector. These edges do not coincide with the stereo algorithm features.

When synthetic data was used, the image depths were produced by the
TMC 3-D Toolkit, as was a dense depth map. A sparse map was obtained
by randomly discarding 90 to 95 percent of the depth values. Uniformly
distributed random noise was added to the synthetic sparse depth data.
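
The construction of the synthetic sparse input can be sketched as follows. The noise amplitude, the use of zero as a placeholder at discarded sites and the function name are assumptions for the example; the text specifies only the discard fraction and that the noise is uniformly distributed.

```python
import numpy as np

def make_sparse_depth(dense, discard_fraction=0.9, noise_amplitude=0.0, seed=0):
    """Randomly discard depth values and add uniform noise to the survivors.

    Returns the sparse map (0.0 at discarded sites, an arbitrary placeholder)
    and a boolean mask that is True where a depth value was kept.
    """
    rng = np.random.default_rng(seed)
    dense = np.asarray(dense, dtype=float)
    keep = rng.random(dense.shape) >= discard_fraction
    noise = rng.uniform(-noise_amplitude, noise_amplitude, size=dense.shape)
    sparse = np.where(keep, dense + noise, 0.0)
    return sparse, keep
```

Returning the mask separately avoids confusing a discarded site with a genuine depth of zero when the data term of the energy is evaluated.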

The initial line process state is set to mimic the intensity edge map as
provided by the Canny edge detection stage. The MRF depth values are created
by using the sparse input depths to "brush fire fill" and then by
deterministically smoothing the depth values. During the deterministic smoothing of
the initial depth process, the depth external field coupling, $\alpha$, is infinite.
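
The fill step is not specified further in the text. One plausible reading of "brush fire fill", sketched below purely as an assumption, is a breadth-first propagation in which every empty site takes the value of the first known depth that reaches it through 4-connected neighbors. The sketch ignores the line process, which the actual fill may or may not respect.

```python
from collections import deque
import numpy as np

def brush_fire_fill(sparse, known):
    """Propagate known depths outward until every reachable site has a value.

    sparse: 2-D array of depths; known: boolean mask of sites holding data.
    """
    filled = np.array(sparse, dtype=float)
    reached = np.array(known, dtype=bool)
    rows, cols = filled.shape
    # Start the "fire" simultaneously at all sites that hold data.
    frontier = deque(zip(*np.nonzero(reached)))
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and not reached[rr, cc]:
                filled[rr, cc] = filled[r, c]  # copy the arriving depth
                reached[rr, cc] = True
                frontier.append((rr, cc))
    return filled
```

The result is piecewise constant, which gives the subsequent deterministic smoothing a complete starting surface.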


4.2  Results 

Figure 4 shows the MRF results on a synthetic image for two intensity edge
coupling schemes. In the first scheme, intensity edges are not used in the
MRF process. This allows depth discontinuities to form anywhere and is
achieved by setting $e_i^j = 1$ for all $i, j$. The upper left image shows the
synthetic scene from which the sparse depth data was derived. The lower
left image in Figure 4 illustrates the depth discontinuities identified with the
MPM estimate of the MRF process. When the depths vary rapidly, many
closely spaced discontinuities are formed. These discontinuities are ragged
and also displaced from the actual object boundaries (as marked by intensity
edges). The reconstructed depth surface is not shown.

The second scheme strongly penalizes depth discontinuity formation
everywhere except at the intensity edges shown in the upper right image of
Figure 4. The external field, $e_i^j$, equals one only at the intensity edge pixels.
The depth discontinuities found are shown on the lower right of Figure 4.
Nearly all the intensity edges due to surface orientation and texture are
eliminated. In some places, such as near the geodesic sphere's boundary, the
surface slope alone is large enough to yield a depth discontinuity.

Another representative image, this time a real image, is shown in Figure 5
where a stereo algorithm produced the sparse depth data. The right image
from the stereo pair appears on the upper left of Figure 5. This scene consists
of a tall stack of newspapers and a small box or carton. The stereo depth
data and the reconstructed surface are not shown. Once again we consider
two cases, depending on whether or not the intensity edges are utilized.
Without the intensity edges, as with the synthetic stereo results, the depth
discontinuities are poorly positioned and ragged. However, with the intensity
edges (upper right of Figure 5), the discontinuities on the lower right agree
reasonably well with the object boundaries.

For these stereo image results, a few difficulties are worth mentioning.
A large depth discontinuity along the top left of the newspaper boundary
is not found. The stereo algorithm produced very poor depth data at this
location and positioned the depth change roughly 5 pixels above the
newspaper intensity edge used by the MRF process. Also the small box's shadow
yielded a small disparity that created a depth discontinuity. The box itself
also had a small disparity so that modifying MRF parameters to eliminate
the shadow discontinuity would have eliminated the box's discontinuity. This
sort of variability is inevitable until a reasonable method for local parameter
estimation is developed.

[Figure 6 legend: Stereo Disparities Exist; Intensity Edge.]

Figure 6: The disparities at edges A-1 and A-2 suggest that a depth
discontinuity should be formed somewhere between A-1 and A-2. Yet, because of
depth process smoothing, the depth difference at intensity edge B-1 may be
too small to support a discontinuity. No discontinuity will form due to this
'misalignment' of edges.

Situations can arise wherein discontinuity detection is hampered when the
intensity edge sites do not coincide with the sites at which external depth
data are provided. Figure 6 displays a possibility where a depth discontinuity
should form between features A-1 and A-2 inclusive. However, the
discontinuity can only form on the intensity edge at B-1 and, because of depth
filling and smoothing, the discontinuity may be washed out. The washing
out depends primarily on the depth difference, the separation between edges
A-1 and A-2, and the smoothing parameters. If edge B-1 were on A-1 or
A-2 then the discontinuity could form readily. One approach to avoid this
coincidence problem is to project a cone of influence about the intensity edge
location. Then the discontinuities could form not only at the intensity edges
but also for one, two or more pixels on either side of the edge. This has
the disadvantage of leading to somewhat poorly localized and ragged edges.
Straightness of the resulting line process is enforced locally by the intrinsic
prior of the line process when the cone of influence is no larger than the
line process neighborhood. Another approach, used here, was to avoid the
washing out by an appropriate selection of the coupling parameters. More
work must be done in this area.
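
The cone of influence idea can be sketched as a dilation of the intensity edge map, so that discontinuities are permitted within a few pixels of each edge. The square window and the function name are assumptions for the example; the text does not fix the shape of the cone, and this approach was not the one used for the results reported here.

```python
import numpy as np

def cone_of_influence(edges, radius=1):
    """Mark every site within `radius` pixels (square window) of an edge.

    edges: 2-D boolean intensity edge map.
    """
    edges = np.asarray(edges, dtype=bool)
    padded = np.pad(edges, radius, mode="constant", constant_values=False)
    out = np.zeros_like(edges)
    rows, cols = edges.shape
    # OR together all shifted copies of the edge map inside the window.
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out
```

Keeping `radius` no larger than the line process neighborhood matches the condition stated above under which the line process prior can still enforce local straightness.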

5 Coupling Intensity Edges to Sparse Motion Data

The simplicity of limiting discontinuities to a subset of intensity edges
immediately suggests its use for other vision modules. The same principles
employed for the stereo depth application have been utilized on motion data.
As with depths, motion fields both from synthetic data and a feature-based
motion algorithm have been used to identify motion discontinuities and to
smooth and fill the sparse motion field. The difference is that motion is a
vector field; depth is not.

The MRF energy of equation 7 is modified by replacing the random field
variable, $F$, by a vector random field, $M$. Likewise, the external field, $G$,
becomes a vector field, $N$. The MRF energy is:

$$U = \sum_i \alpha \gamma_i \, |M_i - N_i|^2 + \sum_{i,j} (1 - l_i^j) \, |M_i - M_j|^2 + \cdots$$

where $M = u\hat{x} + v\hat{y}$ with a similar definition for $N$ and where
$|M_i - M_j|^2 = (u_i - u_j)^2 + (v_i - v_j)^2$. The input field $N$ contains the two components of
the optical flow; the output is $M$ or equivalently, $(u_i, v_i)$ for all lattice sites $i$.
With this energy formulation, motion field direction discontinuities are not
identified, only magnitude discontinuities are marked.
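
The squared vector difference defined in the text can be evaluated directly on the two flow components. The sketch below computes it for horizontally adjacent sites; the gating factor (1 - line) is an assumption about how the line process enters the smoothness term, and the function name is hypothetical.

```python
import numpy as np

def gated_motion_difference(u, v, line):
    """Return (1 - line) * |M_i - M_j|^2 for horizontally adjacent sites.

    u, v: 2-D arrays with the x and y flow components.
    line: 0/1 array of shape (rows, cols - 1), one entry per adjacent pair.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du = u[:, 1:] - u[:, :-1]
    dv = v[:, 1:] - v[:, :-1]
    squared = du ** 2 + dv ** 2   # (u_i - u_j)^2 + (v_i - v_j)^2
    return (1.0 - np.asarray(line, dtype=float)) * squared
```

For neighboring flow vectors (0, 0) and (3, 4) the term is 25 when the line between them is off and 0 when it is on, which is how an active line removes the smoothing penalty across a motion boundary.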

A specialized motion algorithm, such as Horn and Schunck's[12], can be
used to compute the motion field for input to the MRF. The motion data
employed here derive from a parallel algorithm^ 1] that provides match scores
much like the previously used stereo algorithm. Match scores provide a local
measure of trust for the motion data but are not utilized here. Rather than
splitting the problem into early and middle vision parcels, an alternative
approach uses the MRF machinery to compute the motion field in addition
to segmenting the images[20].

Figure 7 illustrates some results on a simple synthetic motion sequence.
The image contains a white square with a small grey texture marking moving
diagonally across a grey and black background. The motion field is
non-zero only on the white square and its texture marking where both x and
y components exist. Roughly 5% of the image motion data is input to the
MRF. The bottom half of figure 7 shows the motion discontinuities identified
both with and without intensity edge information. Again, the intensity edges
significantly enhance the localization of the motion discontinuities.

[...] data exists at only 5 percent of the image pixels.


6  Discussion 


6.1  Central  role  of  intensity  edges 

The results presented here support the idea that intensity edges can be used
as the primary cue to help detect, complete and precisely locate the
discontinuities in the other processes such as depth, motion, texture and color. As
we mentioned earlier, the reason for this is that discontinuities in depth,
surface orientation, motion, texture and color typically originate large gradients
in the image intensity, i.e. edges. Texture boundaries, for instance, can be
synthesized without any intensity edge; it is sufficient to look around to
convince ourselves that in the real world most of the texture boundaries occur
together with an intensity edge. The same is true for motion discontinuities.
Color boundaries also correspond to brightness boundaries. Isoluminant
borders exist only in the psychophysics lab. In addition intensity edges can be
better localized than motion, depth, texture and color discontinuities. The
case of texture is especially clear: the uncertainty in the location of texture

6.2 Open problems in the approach

The [...] integrating intensity edges with depth and motion data [...], as
the figures show. There are, however, many [...] answered before our [...]
can be [...]. There are also more specific questions about our technique of
visual integration and discontinuity detection.

6.2.1  The  Structure  of  Visual  Integration 

The scheme sketched in figure 8 is a preliminary suggestion for the
structure of visual integration. It is close in spirit to the ideas about intrinsic
images proposed by Barrow and Tenenbaum[1]. They did not, however,
have the powerful theory of coupled MRF models to implement their ideas.

Information about the image intensity has a primary role: intensity edges
help the line processes associated with color, texture, motion and depth.
Depth itself also has a special role: in a sense, it is the main output of
the whole system. Motion, texture and color are coupled to depth. They
may not be directly coupled to each other. Notice that the main couplings
are through the line processes, according to the principles outlined in the
introduction. Notice also that local estimates of reliability may be used to
control locally the strength of the coupling: we have seen earlier that in the
MRF model the coupling between depth and its discontinuities is controlled
by the parameter $\alpha$, which is inversely proportional to [...].

The line processes may receive data from early algorithms; at this point it
is an open question how. In the present implementation the intensity edges

Figure 8: The organization of the integration stage. Each of the processes is
coupled to its line process. Intensity data feed into the motion, color, texture
and depth line processes. The line processes are not hidden processes: they
may also receive data from specialized discontinuity detectors. The intensity
line process gets input data from Canny edges. It is coupled to a higher level
field which implements constraints of line continuation and collinearity on a
more global basis than the neighborhood system of the line process. The line
process associated with the depth process is also coupled to a higher level
field which implements the appropriate constraints underlying occlusions of
surfaces. The plausibility of interactions between motion, texture and color
is an open question.

6.2.2 Detailed Questions


Other open questions are: integration of additional visual cues, local vs. global
constraints on the line process, tolerance in registration, multiresolution lattices,
approximative algorithms and neural implementations and learning of
parameters from examples.

Integration of additional visual cues  As figure 8 shows, we plan to
integrate other visual cues with stereo, motion and intensity data. In particular,
we will include texture and color. Because texture boundaries usually depend
on changes of material or sharp changes in surface orientation, they could
be used to support the line processes in the depth and motion modules. For
color the goal is to find boundaries that delineate regions of constant albedo
(at a coarse resolution, since small surface markings should not be "seen" at
this stage). As in the case of depth and motion, intensity edges play a critical
role for these two additional visual modules. Hurlbert and Poggio (see [21])
have sketched a possible scheme for coupling albedo with intensity edges.

It is important to notice that the combination of several cues not
only allows reinforcement of evidence for, say, a depth discontinuity, but also
achieves a classification of an intensity edge in terms of its underlying physical
cause: for instance, whether it is due to a shadow or a depth discontinuity.
Clearly, psychophysics can give useful indications of which interactions are
important in the human visual system.

Local versus global constraints on the line process  The line process
provides a means for imposing important physical constraints on the
discontinuities such as: continuity, relative spatial isolation and possibly collinearity.
These constraints are enforced by using appropriate cliques and associated
energy values. However, in our experience with Markov Random Field
models applied to real data, a problem has emerged with the use of the line
process. In many cases the property of collinearity that can be enforced in
this way remains too local: discontinuities tend to be too jagged and
sometimes even broken when integration with intensity edges is not used. How
can one enforce the property of continuity or simply collinearity over larger
distances within the MRF framework? The basic idea that we have begun
to explore is to have a higher level MRF that consists of "features", such as
straight lines of different orientations, with its prior probability distribution,
coupled (bidirectionally) with the line process lattice (see figure 8).

Tolerance in registration  When data from different cues are combined,
say from intensity and from stereo, they must be registered. Spatial
coincidence is the main constraint exploited here. In general, however, one cannot
expect that discontinuities in depth and intensity will always have exactly
the same location. Because of errors in the early vision processes, effects of
filtering, photometric effects and so on, depth discontinuities may be offset
by one or more pixels from intensity edges. To deal with this registration
problem the cone of influence might be useful, in which the intensity edges
facilitate (or don't veto) the formation of depth discontinuities. The cone of
influence size should be on the order of the line process neighborhood. In this
way the line process constraints will ensure collinearity within the cone of
influence. Again, important information will come from psychophysics: we
expect to learn how alignment of, for instance, intensity edges with depth
discontinuities affects human vision.

Learning parameters from examples  A critical problem in using MRFs
is the problem of parameter estimation. The performance of the scheme
depends critically on the natural temperature of the field, the potentials
associated with the clique configurations, the coupling between the lattices,
and so on. Parameter estimation should provide estimates for these factors,
possibly by learning from a set of examples.

Does integration influence early vision modules?  In our
computational approach to integration we have tacitly assumed that information flows
from the early vision modules to the integration stage (the coupled MRF
system) but not backwards. The output of, say, stereo is modified by the
outputs of other modules at the level of the MRFs but the stereo process
itself (the matching, for instance) is not affected. The decision to neglect
feedback interactions, from the integration stage to the early processes, in the
present version of our theory is mainly due to reasons of simplicity. Without
modifying our scheme in an essential way, it is easy to incorporate backward
effects from the integration stage by assuming that the whole process from
early vision algorithms to the integration stage can be controlled by a
higher-order system taking into account higher-level goals and the available results.
If recognition is the goal, for instance, the current results of the recognition
operation on the integrated information can control which early processes to
apply, where, and how (i.e. which parameters to use). In this case, one may
hope to develop a useful theory of integration without worrying at first about
the problem of feedback.

A different possibility is that interactions between the integration stage
and  the  early  vision  modules  are  an  essential  part  of  any  integration  theory 
and  cannot  be  neglected  even  in  a  first-order  approximation.  In  an  extreme 
case  one  might  not  be  able  to  separate  the  integration  stage  usefully  from 
the  early  vision  modules  and  even  the  modules  one  from  another. 

In  principle,  this  is  possible.  The  algorithms  for  the  early  processes  can 
be  regarded  in  several  cases  as  MRFs  themselves  (regularization  algorithms 
are  special  cases  of  MRFs[2,23]).  Thus  our  coupling  schemes  for  integration 
can  be  extended  to  couple  the  early  processes.  In  practice,  we  expect  that 
parameter  estimation  may  become  a  very  serious  problem  once  the  early 
vision  processes  are  tightly  coupled. 

Hardware implementations  As discussed elsewhere[19,21] the coupled
MRF models used here can be implemented efficiently in mixed digital and
analog hybrid networks. It is interesting that the interaction underlying
coupling between fields is of the type of a multiplication, logical-and or veto
operation. These operations have some intriguing possible implementations
in terms of the properties of synapses.

While it is certainly possible to implement the same mixed deterministic
and stochastic algorithms described here in, say, VLSI technologies, it is
also interesting to explore approximative deterministic algorithms that may
be simpler and more efficient. Marroquin[16] has provided an encouraging
initial analysis along with estimates of convergence properties.